PCA on face data

library(dplyr)
library(R.matlab)
library(imager)

Data

Load the example dataset (from ex7faces.mat, a MATLAB file), which contains a large set of example faces, each 32px by 32px (1024 pixels in total per face), stored in grayscale.

x <- readMat("ex7faces.mat")
x %>% str()
List of 1
 $ X: num [1:5000, 1:1024] -37.87 8.13 -32.87 -84.87 2.13 ...
 - attr(*, "header")=List of 3
  ..$ description: chr "MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Nov 14 23:46:35 2011                                                "
  ..$ version    : chr "5"
  ..$ endian     : chr "little"
x <- data.frame(x)
x %>% dim
[1] 5000 1024
x[1:6, 1:6]
         X.1         X.2        X.3        X.4        X.5       X.6
1 -37.866314 -45.8663139 -53.866314 -51.866314 -40.866314 -33.86631
2   8.133686  -0.8663139  -8.866314 -15.866314 -17.866314 -16.86631
3 -32.866314 -34.8663139 -36.866314 -18.866314   6.133686  15.13369
4 -84.866314 -64.8663139 -47.866314 -42.866314 -38.866314 -28.86631
5   2.133686   6.1336861   5.133686   9.133686  10.133686  11.13369
6  60.133686  58.1336861  60.133686  59.133686  56.133686  41.13369

Visualize the example dataset (display the first 100 faces).

n <- nrow(x)
p <- ncol(x)
npix <- sqrt(p)

v <- 100

ll <- vector("list", v)


# (x[1,] %>% as.numeric()) %>% matrix(npix, npix) %>% head
x[1, ] %>% as.numeric() %>% matrix(npix, npix) %>% apply(2, range) %>% range
[1] -123.86631   75.13369
x[1, ] %>% as.numeric() %>% matrix(npix, npix) %>% apply(1, range) %>% range
[1] -123.86631   75.13369
as.cimg(x[1, ] %>% as.numeric(), x = npix, y = npix)
Image. Width: 32 pix Height: 32 pix Depth: 1 Colour channels: 1 

for (i in 1:v) {
    ll[[i]] <- as.cimg(x[i, ] %>% as.numeric, x = npix, y = npix)
    ll[[i]][, , 1, 1] <- t(ll[[i]][, , 1, 1])
}

par(mfrow = c(sqrt(v), sqrt(v)), mar = c(0, 0, 0.5, 0), bg = "darkslategray")
for (i in 1:v) plot(ll[[i]], axes = F)

PCA

  1. Evaluate whether PCA can be an effective method for dimensionality reduction here. If so, run a PCA appropriately and visualize the eigenvectors, which in this case are eigenfaces.
x %>% ...
x %>% colMeans() %>% summary
x %>% var %>% diag %>% sqrt %>% summary
pca <- ... (... ... ...)
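As a hedged sketch of what the blanks might look like (the names `xs` and `pca_s` are illustrative; on the faces one would call `prcomp(x)` directly): all pixel columns share the same grayscale units, so an unscaled, covariance-based PCA is appropriate, and `prcomp` already centers the data by default.

```r
# Sketch on a small synthetic stand-in for the 5000 x 1024 face matrix.
set.seed(1)
xs <- matrix(rnorm(200 * 16), 200, 16)  # stand-in data; use x for the faces
pca_s <- prcomp(xs)                     # center = TRUE, scale. = FALSE (defaults)
dim(pca_s$rotation)                     # p x p matrix of eigenvectors
```

Because the variables are on a common scale, scaling to unit variance would only amplify noise in near-constant pixels, which is why `scale. = FALSE` is left at its default here.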
  1. What is the dimension of the eigenfaces matrix?
pca$...
[1] 1024 1024
  1. Then, we can visualize the eigenfaces as images (let us display the first 36).
m <- 36

eigenfaces <- ...

...
...
...
...
...
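One possible way to fill in the display step (a sketch on synthetic data: `npix` is shrunk from 32 to 8, and base `image()` is used instead of imager so the snippet stays self-contained):

```r
# Sketch: each column of the rotation matrix is an eigenface; reshape the
# first m columns into npix x npix matrices and draw them in a grid.
set.seed(1)
npix <- 8                                   # stand-in for the 32-pixel faces
xs <- matrix(rnorm(100 * npix^2), 100, npix^2)
pca_s <- prcomp(xs)
m <- 4
par(mfrow = c(2, 2), mar = c(0, 0, 0.5, 0))
for (i in 1:m) {
    ef <- matrix(pca_s$rotation[, i], npix, npix)
    # transpose and flip so the image is drawn in matrix orientation
    image(t(ef)[, npix:1], axes = FALSE, col = gray.colors(64))
}
```

With imager, the same idea would reshape each column via `as.cimg(..., x = npix, y = npix)` and transpose it, exactly as done for the original faces above.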

  1. How do you interpret them?
  • Why do they appear as ghost-like faces?
  • Do you recognize block-type and difference-type eigenfaces?

Dimension reduction

  1. What dimension k should a lower-dimensional vector subspace have in order to still represent the original images accurately?

Some useful tools are shown below.

                             PC2       PC3
Standard deviation     481.22781 334.67523
Proportion of Variance   0.13705   0.06629
Cumulative Proportion    0.43980   0.50609
                            PC18      PC19
Standard deviation     106.96036 106.32226
Proportion of Variance   0.00677   0.00669
Cumulative Proportion    0.74998   0.75667
                           PC67     PC68
Standard deviation     48.88754 48.10736
Proportion of Variance  0.00141  0.00137
Cumulative Proportion   0.89903  0.90040
                           PC90     PC91
Standard deviation     39.17799 38.64754
Proportion of Variance  0.00091  0.00088
Cumulative Proportion   0.92467  0.92555
                          PC128    PC129
Standard deviation     28.99461 28.71160
Proportion of Variance  0.00050  0.00049
Cumulative Proportion   0.94996  0.95045
[1] 1650.134
                           PC85     PC86
Standard deviation     40.95566 40.17880
Proportion of Variance  0.00099  0.00096
Cumulative Proportion   0.92000  0.92095
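Excerpts like those above come from `summary(pca)`; the same decision can be automated from the cumulative proportion of variance (a sketch on synthetic data; the 0.95 threshold is one common but arbitrary choice):

```r
# Sketch: pick the smallest k whose cumulative proportion of explained
# variance reaches a target, here 95%.
set.seed(1)
xs <- matrix(rnorm(300 * 40), 300, 40)
pca_s <- prcomp(xs)
cumvar <- cumsum(pca_s$sdev^2) / sum(pca_s$sdev^2)
k <- which(cumvar >= 0.95)[1]
c(k = k, explained = cumvar[k])
```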

Rebuilding faces

Now that we have the eigenfaces, we can project our original faces onto a subset of k of them, thus reducing each image from p = 1024 dimensions down to a vector z of k dimensions.

Let’s project our data onto k = 100 eigenfaces (pca$x exists if you used prcomp above; otherwise you can easily compute the z scores by a matrix product),

k <- 100
z <- pca$x[, 1:k]
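If `pca$x` were not available, the same scores follow from multiplying the centered data by the first k eigenvectors (a sketch on synthetic data; `z1` and `z2` are illustrative names):

```r
# Sketch: prcomp scores equal (X - column means) %*% V_k.
set.seed(1)
xs <- matrix(rnorm(100 * 20), 100, 20)
pca_s <- prcomp(xs)
k <- 5
z1 <- pca_s$x[, 1:k]
z2 <- scale(xs, center = TRUE, scale = FALSE) %*% pca_s$rotation[, 1:k]
max(abs(z1 - z2))   # essentially zero
```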

and then rebuild the original faces using the eigenfaces and z encodings for each image,

x_approx <- z %*% ... + ...
x_approx %>% dim
[1] 5000 1024
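One way the blanks might be completed: multiply the scores by the transposed k eigenfaces and add back the column means (a sketch on synthetic stand-in data; with k = p the reconstruction is exact up to floating point):

```r
# Sketch: x_approx = z %*% t(V_k) + column means.
set.seed(1)
xs <- matrix(rnorm(100 * 20), 100, 20)
pca_s <- prcomp(xs)
k <- 5
z <- pca_s$x[, 1:k]
x_approx <- sweep(z %*% t(pca_s$rotation[, 1:k]), 2, pca_s$center, "+")
# sanity check: using all p components recovers the data exactly
x_full <- sweep(pca_s$x %*% t(pca_s$rotation), 2, pca_s$center, "+")
max(abs(x_full - xs))   # essentially zero
```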

resulting in the images below (consider just the first 100 faces for a comparison with the original ones above):

By using less than 10% of the original dimensions we are able to reconstruct the original pictures quite well.

2018-12-03